Introduction


The past few years in state assessment have been rough. The decade began with
the Obama administration’s Race to the Top Assessment (RTT-A) grant, which
funded states to develop higher-quality and more rigorous assessments aligned
to the newly adopted Common Core Standards in math and reading.¹ Two multistate
consortia focused on math and reading assessment kicked off their work around 2010:
the Partnership for Assessment of Readiness for College and Careers (PARCC) and
the Smarter Balanced Assessment Consortium (SBAC).² Forty-five states initially signed on.
The consortia tests achieved many of their aims. Multiple studies have found that the
tests align well to college- and career-ready standards, giving educators, families,
and policymakers a more honest measure of student performance and progress.³ The
consortia also pushed the field forward on test administration and test development
processes, including technological innovations that were big advancements over
most tests that came before.⁴ And comparable student test results across states were
available to millions of families for the first time. Despite these achievements, by the
time new tests debuted in 2015 they already faced intense backlash from political actors
and later the public.⁵ Today 12 states remain in the Smarter Balanced consortium and
PARCC has essentially disbanded, although several states still administer the PARCC
test or use PARCC items in their new tests.⁶


There were various factors behind the backlash, most of them unrelated to the quality
or specific features of states’ new tests. Some teachers and families pushed back on time
spent testing and the perceived high stakes tied to tests, especially in states intending to
use test scores as a component in teacher evaluations.⁷ Computer-based tests spurred
investments in school technology, but they also introduced new administrative hurdles
and costs. Although annual state tests had been federally required since 2001, and
the consortia and standards were led by states, the new tests became a focal point of
narratives about federal overreach and over-testing.


Current wisdom holds that testing has become politically toxic. There are real risks that
some states are rolling back advancements in test quality, accessibility, and rigor in the
name of reducing test time and cost, or answering political pressures.


But that is not the whole story of state assessment today. In fact, there are several areas
with evidence of improvement, innovation, or interesting new developments, several
of which go beyond states’ federally mandated role in testing. States are continuing
to rethink their roles in assessment and their assessment systems in ways that may
benefit teaching, learning, transparency, and equity. One of our primary goals for this
brief is to inform and reinvigorate public conversation around assessment innovation
at the state level, especially among education policy leaders who care deeply about
educational progress, equity, and innovation, but might not see much cause for optimism
or investment in state assessment.


One encouraging high-level trend in state assessments is an increasing emphasis
on instructional relevance and resources that can help bring standards to life in the
classroom, often as a complement to summative tests. Whereas once the state role in
assessment was almost entirely limited to developing and administering traditional
summative tests, states are thinking about ways to build more comprehensive
assessment systems that include different kinds of tests and align with parallel efforts to
improve instruction, professional development, standards, and curriculum.


New ideas in assessment may pick up steam with the help of $379 million in competitive
federal grants for assessment innovation, announced in late January 2019.⁸ The
priorities for this program include interim assessments; science, technology, engineering,
and math (STEM) assessments; and tests that incorporate new kinds of technical designs
or project-based learning.


This brief highlights developing trends and opportunities for state systems of
assessment, especially in areas beyond federally mandated reading and math
assessments. Which states are pursuing these ideas, and what might be holding others
back? What are the risks and rewards of investment and innovation in new test designs
or assessments?


The State of Assessment


Bellwether Education Partners


In particular we discuss the landscape of:

1) Interim assessments for accountability
2) Formative assessments to support instruction
3) Interstate collaborations and shared item banks
4) Science and social studies assessments


These four topics do not represent the totality of progress and innovation in assessment.
We chose them because each is an example where innovation and improvement are
within close reach for states, and where some states are already leading in ways that may
be instructive. This brief is not a comprehensive technical review, and for assessment
professionals steeped in test design and administration, some topics may be familiar.


All of these ideas come with potential risks, but all are worthy of deeper investigation, 
discussion, and research, towards the goal of high-quality assessment systems that support 
equity and transparency while at the same time advancing teaching and learning. Our hope 
is that as a few states push beyond the status quo and show how new assessment ideas can 
work in the real world, more states will follow suit and create better assessment systems 
that support improved outcomes for students. 


Glossary 


Accountability: When we refer to accountability in this brief, we mean the system of measures, interventions,
and supports for schools created by every state to meet the mandates of the Every Student Succeeds Act.


Computer Adaptive Assessment: A computer adaptive assessment, or adaptive test, uses an algorithm to
select each item for individual students dynamically based on the answers to prior questions. This kind of
test design can allow for greater precision in measuring students’ skills in a shorter time, because it homes in
on areas where students are stronger or weaker.


Formative Assessment: A planned, ongoing process used during teaching and learning to elicit and use
evidence of student learning to improve student understanding.⁹ These small-scale assessments (e.g.,
homework, exit tickets, curriculum-embedded activities) have an explicit purpose to diagnose and advance
student knowledge and skills during learning.¹⁰ Formative assessments are designed to play an active role in
instruction, and are typically not appropriate for aggregation or use for accountability.


Interim Assessment: A method of evaluation to measure students’ knowledge and skills within a limited
time frame (less than a school year). Interim assessments are used to inform decision-making at the
classroom level and beyond. They may be administered at set intervals, or directly after a student has
learned or demonstrated knowledge in a subject area (competency-based assessments).


In this brief we explore states using multiple interims throughout an academic year for accountability
purposes, replacing summative assessments.¹¹ Interim assessments can also be used as part of a formative
assessment process, described below.


Performance assessment, or performance-based tasks: Assessments that ask students to perform a task
or generate a response that engages and authentically demonstrates their knowledge and skills on a topic.¹²
Performance assessments can be short (a writing exercise) or long (a semester-long project). These kinds of
assessments can be used for formative or summative purposes (or both).


Reliability: A reliable test produces results that are consistent across students and time. If two students
with the exact same level of knowledge took a reliable test, they should get the same score. A student with
less knowledge should get a lower score, and a student with more knowledge should get a higher score.


Shared Item Bank: A database of curated assessment items maintained by a collaborative (of states,
districts, organizations) that can be shared and administered by multiple parties.


Summative Assessment: Any method of evaluation that measures students’ cumulative learning against
standard criteria over a set period, typically over the span of the course or school year.¹³ State summative
test results are used in a variety of ways in different states and districts: as a comparable measure of
student performance across schools, districts, and years; as part of state and local accountability systems
to measure school performance; as a component of teacher evaluations; and in some cases as a graduation
requirement or benchmark measure for students.


Test Item: Another term for a question or task on a test.


Validity: In educational assessment, validity refers to the accuracy of assessment results or a specific test
item: whether a test accurately measures what it purports to measure, and whether the test results can
support a certain use or interpretation (for example, whether a score on a state test accurately reflects
student knowledge of a topic relative to their grade level, and can be used to make instructional choices for
the student). The standards of validity vary depending on the intended uses of test results. A test that is
used to determine whether a student graduates should have a different standard of validity than a test used
to determine which students will work together on a class project.
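

To make the glossary entry on computer adaptive assessment more concrete, the following is a minimal sketch of the item-selection loop it describes, written in Python. It is an illustration under stated assumptions, not a description of how any state or vendor test works: operational adaptive tests typically rely on item response theory models, while this sketch swaps in a simple rule that raises or lowers the ability estimate by a shrinking step. The names `Item`, `run_adaptive_test`, the difficulty scale, and the simulated student are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    difficulty: float  # hypothetical difficulty on an arbitrary scale


def run_adaptive_test(bank, answers_correctly, num_items=5, start=0.0, step=1.0):
    """Pick each item based on prior answers; return (estimate, administered)."""
    estimate = start
    remaining = list(bank)
    administered = []
    for _ in range(min(num_items, len(remaining))):
        # Select the unused item whose difficulty is closest to the current estimate.
        item = min(remaining, key=lambda it: abs(it.difficulty - estimate))
        remaining.remove(item)
        correct = answers_correctly(item)
        # Assumed update rule (not from the brief): move up after a correct
        # answer, down after an incorrect one, and halve the step each time.
        estimate += step if correct else -step
        step /= 2
        administered.append((item.item_id, correct))
    return estimate, administered


# Hypothetical item bank with difficulties from -3.0 to 3.0 in steps of 0.5.
bank = [Item(f"q{i}", -3.0 + 0.5 * i) for i in range(13)]

# Simulated student who answers correctly whenever difficulty is at or below 1.2.
true_ability = 1.2
estimate, administered = run_adaptive_test(
    bank, lambda item: item.difficulty <= true_ability
)
print(estimate, administered)
```

In this toy setup, five items are enough for the estimate to land near the simulated student's ability, which is the intuition behind the glossary's claim that adaptive designs can measure skills with greater precision in a shorter time.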
Innovative Assessment Demonstration Authority


One heavily discussed topic in assessment innovation is the Innovative Assessment Demonstration Authority pilot
program (IADA) created under the Every Student Succeeds Act (ESSA).ⁱ This program allows up to seven states to
pilot innovative state assessment models that vary from some (but not all) federal requirements for testing. States
that want to participate in the pilot must ensure that their proposed assessments result in an annual summative score
or proficiency determination for each student; are valid and reliable for all students, including English learners and
students in special education; are comparable across districts; include a representative sample of students from around
the state; and may be scaled statewide within three years. As of May 2019, the Department of Education has approved
proposals from Louisiana and New Hampshire. Proposals from Georgia and North Carolina are under consideration.


When IADA was created as part of ESSA, there was considerable speculation on what states might do with increased
flexibility from federal assessment requirements, especially since some state chiefs had publicly criticized ESSA’s
testing mandates.ⁱⁱ Some education analysts and assessment experts have expressed disappointment that more
states did not pursue this opportunity.ⁱⁱⁱ Others point to the technical constraints and requirements of the pilot, as
well as the lack of additional federal funds associated directly with the waivers.


Among the four states that have applied so far, there are not many clear trends or commonalities. Because several
of the ideas from IADA applications intersect with other trends of interest in this brief, such as interim assessments
for accountability or social studies, Louisiana, North Carolina, and New Hampshire are discussed in greater detail
below. But states do not have to participate in IADA pilots and submit to increased federal scrutiny in order to pursue
innovation and improvement. The pilot program is a potential vehicle for innovative ideas, not an innovation on its
own. Thus, a slow takeoff to the IADA program is not necessarily indicative of the outlook for assessment innovation.


i Office of Elementary and Secondary Education, “Innovative Assessment Demonstration Authority,” US Department of
Education, https://www2.ed.gov/admins/lead/account/iada/index.html.


ii Alyson Klein, “How Will ESSA’s Innovative Assessment Pilot Work?,” Education Week, June 30, 2016,
http://blogs.edweek.org/edweek/campaign-k-12/2016/06/how_will_essas_innovative_asse.html.


iii Alyson Klein, “What Happens When States Un-Standardize Tests?,” Education Week, October 11, 2018,
https://www.edweek.org/ew/articles/2018/10/11/what-happens-when-state-un-standardize-tests.html.
Interim Assessments for Accountability


Right now, most students take their state exams once per year, and results might not
come back to schools and families until after the start of the next school year.¹⁴ If one
of the goals for state summative tests is to inform instruction and school decision-making,
this means students do not get timely feedback on their performance, and by the
time school leaders and teachers see last year’s scores, instruction for the next year is
already underway. Practically, this means the cycle by which state assessments can inform
instruction and school decisions is slow and limited.


A chance to alter this model is already embedded in federal education law, but few
states seem interested. ESSA invites states to replace their end-of-year summative
assessments with assessments administered over the course of the year, also called interim
assessments. States’ assessments may “be administered through multiple statewide
interim assessments” to provide “valid, reliable, and transparent information on student
achievement or growth.”¹⁵


Additionally, two of the priority areas for competitive federal assessment grants
announced early in 2019 include “the development of comprehensive academic assessment
instruments ... that emphasize the mastery of standards and aligned competencies in a
competency-based education model,” and “measuring student academic achievement using
multiple measures from multiple sources.”¹⁶ Various models of interim assessments for
accountability could meet one or both of these priorities.


Many schools and districts already use interim assessments for their own benchmarking
and instructional planning purposes, either with state-provided tests, tests purchased from
vendors, or tests designed on their own. In contrast, interim assessments for accountability
would be mandatory, and would replace state tests in an accountability system.


Benefits 


Interim assessments for accountability offer some strong potential benefits for states,
teachers, and schools. Teachers could receive actionable feedback and adjust instruction
over the course of the year. School districts could save money if they no longer need to
purchase supplementary interim assessments, and states could be assured of the quality of
tests schools use during the year. Interim assessments are usually shorter, and because of
this they may be easier to integrate into schools’ day-to-day schedules. This could discourage
weeks of pretest preparation and a long period of “testing mode” near the end of the year.


Some states are also seeking to influence instructional practices. If each interim assessment
covers fewer standards at a deeper level, it could encourage more depth of instruction in
the classroom. More flexible testing models could potentially encourage the adoption of
personalized or competency-based instruction, because they could allow students and/or
schools to demonstrate proficiency in a particular subset of standards on different timelines.


Risks 


The technical, political, and logistical challenges of designing and implementing this kind
of assessment are considerable. First, influencing curriculum, scope, and sequence might
be part of states’ goals for this work, but it may also generate opposition. States have
historically been agnostic about when during the academic year students should be able
to master standards, but if states design interim assessments to test subsets of standards
at particular intervals, it could reshape the pacing and ordering of classroom curriculum to
align with the tests. Even where that reshaping is intended, it may provoke pushback in
the name of local control.


A more flexible, competency-based interim test design would have less of an impact
on curriculum pacing, but could pose different risks to comparability of test data for
accountability. If two schools administer components of the assessment at different points
of the year, after different amounts of instructional time, in different orders, can students’
scores be fairly compared?¹⁷ When there are high stakes linked to an assessment, small
degrees of variation or potential for bias can have a big effect on reliability and validity.


Logistically, interim assessments for accountability may actually take up more combined
time and resources than traditional summative assessments. States will need to increase
their capacity to provide nearly year-round assessment administration support and schools
will need to adapt to a new mode of testing in their teaching and in their administrative
procedures.¹⁸ Some districts may choose to stick with their own interim assessments in
addition to the state test, creating redundancy.


In order to see success with this model, states must be willing to put some kind of stake
in the ground around curriculum and sequencing, and work with districts and teachers to
ensure that results are both useful and reliable. Interim assessments have the potential
to provide data to teachers on a faster timeline and reduce costs for districts, but state
policymakers should not forget that teachers and administrators will need additional
supports and guidance to use these new kinds of tests and the results they generate.


State Examples 
Nebraska 


Nebraska is currently planning to develop interim assessments for accountability purposes
in grades 3-8.¹⁹ Nebraska aims to give students the opportunity to interact with their own
data and monitor their growth throughout the year, and to provide teachers with real-time
data to improve classroom instruction.²⁰ This system will include an adaptive end-of-year
test and interim assessments in grades 3-8, while high school students will continue to take
the ACT test.²¹ Starting in 2017-18, Nebraska partnered with NWEA to develop the new
test and offer MAP Growth optionally to all schools. MAP Growth is an adaptive interim
assessment used by schools in many states across the country, which measures student
skills across grade levels.²² Although it is optional, all Nebraska school districts now use
MAP Growth. As a next step, the state will develop a new state adaptive interim assessment
system, which will measure student performance against grade-level expectations for the
purposes of accountability requirements, and replace the current end-of-year test.²³


Louisiana


Louisiana was the first state to win approval under the Department of Education’s IADA
pilot for a new humanities assessment, which will measure students’ understanding of
English language arts (ELA) and social studies content with interim assessments.²⁴ This
new test will align with content in Louisiana’s ELA Guidebooks curriculum, an optional
resource aligned with college- and career-ready standards developed by the state.²⁵ Several
brief assessments including both ELA and social studies content will be administered
throughout the year, rather than in end-of-year ELA and social studies exams. Tests will
focus on books or passages that students have studied in class.²⁶ As a result, state leaders
hope that teachers can focus instruction on important background knowledge and making
meaning of full texts, and students will have equitable opportunities to develop knowledge
about topics covered on the assessments and in the standards.²⁷


North Carolina


North Carolina recently submitted an IADA pilot application to expand an existing interim
assessment approach, called NC Check-Ins, into a system that could generate an end-of-year
score and be used for accountability.²⁸ These assessments would be administered
three to four times per year in flexible test windows, and use items directly drawn from
North Carolina’s existing end-of-year tests. Under the proposal, the state would still
maintain an end-of-year test for students who missed earlier interim tests, and potentially
as a retest option for students who do not demonstrate proficiency in the interim
assessments. Reducing test time and increasing transparency for students, teachers, and
families are among the state’s top goals for this proposal. North Carolina’s application is still
under consideration as of April 2019, and federal reviewers requested more information
from the state on a variety of topics, including alignment between the interim assessment
and state standards, how the state will arrive at a summative score based on interim tests,
and plans for eventually scaling the pilot statewide.²⁹


Sidebar 2 


Innovation in Assessments for Special Populations 


Federal grant programs in the past decade funded not only PARCC and Smarter Balanced but also four
additional multistate consortia focused on higher-quality assessments for specific student groups. DLM and MSAA/
NCSC created new alternate assessments for students with significant cognitive disabilities. WIDA ACCESS and
ELPA21 created new assessments of English language proficiency. Many states still administer all of the above
tests, and they have not experienced the politicization and backlash around the higher-profile math and reading
assessments.


This brief does not focus on these tests in detail, but one of the priorities of the new competitive federal assessment
innovation grants is innovation in tests for both special education students and English learners. With these
additional resources, it is possible that assessments for special populations of students could be a new frontier in
assessment innovation and advancement.


Formative Assessments to Support Instruction 


An increasing number of states are using guidance, training, and incentives to shape
formative assessments for instructional purposes. Formative assessment is a
planned, ongoing process of low-stakes, informal assessment used by students and
teachers during learning.³⁰ Formative assessments are not separate from instruction, but
are opportunities for students to practice what they have learned during commonplace
classroom activities and opportunities for teachers to adjust their instruction based on
quick analysis of student performance.³¹


These kinds of lower-stakes assessments have not traditionally been a part of states’ 
assessment purview, as they do not generate data that is appropriate for use in 
accountability or for aggregation. But one of the things state leaders heard loudly from the 
pushback to testing was that state summative assessments were rarely useful to teachers 
directly. Other evidence suggested that many schools lacked high-quality instructional 
materials and practices that could help students build the skills and knowledge they needed 
to succeed on more rigorous state tests.


In response, more states are getting involved in formative assessments for instruction
by providing or recommending vetted assessment resources aligned to state standards.


States are offering guidance, purchasing resources, or subsidizing the cost of a menu of
assessment options. Some states’ increased activity around formative assessments for
instruction kicked off with the optional interim assessments made available by the Smarter
Balanced consortium. Positive feedback around those resources may have generated
interest in more state-provided low-stakes or no-stakes tests and tools.³⁶ Additional
resources around formative assessment for instruction can be especially useful in pre-K
through second grade or in subjects where summative tests are not required by law.


Benefits 


Demand for and interest in higher-quality formative assessments for instruction among
schools and districts are clear in the growing market for classroom assessments. In 2017,
spending on classroom assessments reached nearly $1.6 billion nationally, compared to
$1.3 billion on state-level mandatory tests.³⁷ States may have more capacity than districts
to differentiate among options in the market or design custom resources, evaluate quality
and alignment with state standards, and subsidize higher-quality choices.


If states offer these kinds of resources in addition to high-quality tests that can be used
for accountability, it could help more teachers and school leaders improve instruction
and understand college- and career-ready standards, ultimately helping more students
reach those standards. It could also help build trust among state agencies, districts, and
educators if the state provides resources around assessment that are exclusively meant for
instructional, not accountability, purposes.


Risks 


States will need to strengthen their internal capacity to effectively do work that was
historically outside of their purview. To address this concern, states should consider
partnerships with local universities or professional associations that could enhance the
design and implementation of formative assessment resources. States will also need to
invest in training and professional development to help administrators and teachers put the
new resources at their disposal to use in ways that will serve students.


There is also a risk that educators will associate any test that comes from the state with
high stakes and accountability, and thus mistrust it. This is especially salient if student data
are collected in a computer-based format. These assessments will not achieve their goals
if teachers and school leaders are resistant to using them. States can mitigate this risk
through clear communication, outreach, and training about the intended purpose and uses
of various assessments.


Given tight budgets in some states, it is critical that investments in formative tests for
instruction do not supplant efforts to maintain and improve other tests that can be used for
public reporting, equity monitoring, and accountability systems. These kinds of tests serve
different, complementary purposes, and one should not come at the expense of the other.


State Examples 
Center on Standards & Assessment Implementation Consortium 


The Formative Assessment Bi-Regional Advisory Board (FAB-RAB) is a consortium of seven
states committed to incorporating formative assessment as part of their comprehensive
assessment system.³⁸ Through partnership with the WestEd/CRESST Center on Standards
and Assessment Implementation, sponsored and facilitated by the Central and South
Central Comprehensive Centers at the University of Oklahoma, participating states
receive professional development for state leaders, discuss and resolve common problems
associated with formative assessment implementation at the district and classroom level,
create common regional resource materials, and collaboratively develop long-range
implementation plans.³⁹ Member states have initiated their own formative assessment
advisory boards, and have focused on ways to embed formative assessment practices into
instruction across their states.


Michigan


Formative Assessment for Michigan Educators (FAME)⁴⁰ is a joint effort of the Michigan
Department of Education and the Michigan Assessment Consortium that trains Michigan
teachers, coaches, and leaders in effective formative assessment processes.⁴¹ FAME not only
offers trainings and resources but also organizes teachers and coaches into a system of “Learning
Teams” that sustain the work of formative assessment in schools. Educators apply to the
state, build a local “Learning Team,” and receive initial and ongoing training on FAME’s model
of formative assessment and peer support. Regional staff support the educator teams on an
ongoing basis. Several research studies of the FAME model are underway.⁴²


Flexible Collaboration and Shared Item Banks


In the PARCC and Smarter Balanced models, states co-developed and shared whole
assessments. But every test is composed of questions (also called “items”), each of which
can cost hundreds or thousands of dollars to develop and validate. Instead of sharing
whole tests, more states now mix, match, and share large groups of items via shared
item banks. An item bank is a database of test items, which can be owned by a vendor, a
nonprofit organization, a state, or a group of states. Item banks are not a new development
in assessment. But states are participating in these banks and collaborating on tests in
new ways, especially as the number of states participating in consortia tests has dwindled.
Shared item banks are now common practice in large-scale assessment design, not only
in math and reading, but in science, social studies, alternative assessments, and English
language assessments.


Federal law required that any items developed with federal dollars, including items
developed under the RTT-A grants, be made “freely available to states, technology platform
providers, and others that request it for purposes of administering assessments.”⁴³ This
does not mean the items created by the consortia are free, but RTT-A grants created a
plethora of new, high-quality items aligned to college- and career-ready standards that
were available to consortia and non-consortia members alike. As states sought out a way to
create new tests of their own, using items from multiple shared sources in addition to
state-created models was an appealing option.




Benefits 


There are many benefits to cross-state collaboration and shared item banks, which is
part of the reason they have become common practice in state assessment. The top
reasons to participate are access to high-quality items at lower cost and more cross-state
brainpower and idea sharing. Technical details, such as best practices for agreements among
states and vendors involved in item banks, are fairly well established in the field.


There are costs associated with purchasing items from an item bank, but these are
generally much lower than making new tests from scratch because multiple users of an item
bank can share costs. In some cases, states pay back to the item bank in kind by contributing
their own items to be vetted and used by others.


Beyond cost-sharing benefits, this model is cause for optimism because it can support
higher-quality tests and interstate collaboration even as states shift away from the
consortia. Shared item banks and hybrid test designs may also give states the ability to
maintain stability and comparability in their scores over time, even if their tests change.44
For instance, if a former PARCC state uses enough PARCC items in its new test, it could
establish links between the two tests that allow it to track growth trends over time.
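
As a rough illustration of how shared items could anchor such a link, the sketch below
applies chained linear linking: scores on the old test are mapped onto the scale of the
common items, and common-item scores are then mapped onto the scale of the new test. It is
a simplified sketch, not a description of any state's actual procedure; operational
programs generally rely on more elaborate psychometric methods. The function names and all
score values are hypothetical.

```python
from statistics import mean, pstdev


def linear_link(from_scores, to_scores):
    """Return a function that maps the 'from' scale onto the 'to' scale
    by matching the mean and standard deviation of the two score sets."""
    m_from, s_from = mean(from_scores), pstdev(from_scores)
    m_to, s_to = mean(to_scores), pstdev(to_scores)
    if s_from == 0:
        raise ValueError("scores on the 'from' scale must vary")
    slope = s_to / s_from
    return lambda x: m_to + slope * (x - m_from)


def chained_link(old_total, old_anchor, new_anchor, new_total):
    """Chain two linear links through the common (anchor) items.

    old_total / old_anchor: total and common-item scores for students
        who took the old test.
    new_anchor / new_total: common-item and total scores for students
        who took the new test.
    """
    old_to_anchor = linear_link(old_total, old_anchor)
    anchor_to_new = linear_link(new_anchor, new_total)
    return lambda x: anchor_to_new(old_to_anchor(x))


# Hypothetical score data, invented purely for illustration.
old_total = [10, 20, 30, 40, 50]
old_anchor = [2, 4, 6, 8, 10]
new_anchor = [2, 4, 6, 8, 10]
new_total = [15, 25, 35, 45, 55]

link = chained_link(old_total, old_anchor, new_anchor, new_total)
print(round(link(40), 2))  # old-test score of 40 expressed on the new scale
```

In this toy data set, a score of 40 on the old test corresponds to roughly 45 on the new
test. The more common items the two tests share, the more stable a link of this kind is
likely to be, which is why the number of retained items matters.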


Theoretically, a similar approach could also be used for some kinds of cross-state
comparisons, even if states are not administering the exact same tests. Over time, it is
possible that these collaborations around items could evolve into more ambitious
cross-state endeavors that would support new kinds of innovative thinking around test design.45


But states do not have to be aiming for comparability in order to participate in item bank
collaborations. One advantage of this model is that states can maintain as much autonomy
over test design decisions as they desire.


Risks 


There are relatively few technical risks to participating in a shared item bank, as long as
states set clear, high standards for item quality, and consider issues such as equitable
cost sharing among states, test item security, and agreements on timelines for releasing
items to the public. As explained above, because various kinds of shared item banks have
quickly become common, most states and vendors have developed reliable strategies for
addressing these questions in cross-state agreements and planning stages. These kinds of
agreements were put to the test on a wide scale with PARCC and Smarter Balanced.


States face bigger risks going it alone. Tennessee is one such cautionary tale. After the state
legislature abruptly pulled out of the PARCC consortium, the state tried to quickly develop
its own assessment system, TNReady. The state spent more than $130 million in hasty test
development only to be plagued by years of technical problems and vendor failures in ways
that negatively impacted the usability of its results.46

The less predictable risks to cross-state collaboration are primarily political, especially
if states are aiming for cross-state comparability in their results. If political winds shift,
support for collaboration across state lines may shift as well.


State Examples 
New Meridian 


In 2017, New Meridian, an assessment design and development organization, was selected
as the authorized agent of the Council of Chief State School Officers (CCSSO) to maintain
the item bank and test designs developed by the former PARCC states. States can license
some or all of that content from New Meridian for their own tests, along with options
for various kinds of advisory and test development support.47 New Meridian’s approach
developed in direct response to states’ need for high-quality assessment content at a
reasonable cost with autonomy around design decisions, especially among states leaving
the consortia.48 So far, many of New Meridian’s partner states are former PARCC states.49
States still have access to some of the advantages of a consortium (high-quality items at a
reduced cost), and continuity in their test design and content, with the flexibility to make
individual choices. New Meridian’s current focus is producing shorter and/or adaptive
assessment designs for states that build upon the content and design of PARCC.50

Michigan and Smarter Balanced 


In 2014, the Michigan legislature required the Michigan Department of Education (MDE)
to develop a new state test for spring 2015. As a result, Michigan rebranded Smarter
Balanced ELA and math content while simultaneously developing new content aligned
to state standards in one assessment called the Michigan Student Test of Educational
Progress (M-STEP).51 The current assessment is a hybrid of performance tasks, classroom
activity, and an online assessment that includes both Smarter Balanced and
Michigan-developed content.52




Science and Social Studies Exams 


Science and social studies assessments are ripe for state innovation and
experimentation. Generally, state action in these content areas requires a committed
state leader or other driving force because these assessments are mostly optional, costly,
and usually do not count towards things like accountability ratings. But these subjects could
be a laboratory of assessment innovation, with wide flexibility in law and subject areas that
have been somewhat neglected in the era of math and reading assessments. Science and
social studies content are both especially well-suited to performance-based, hands-on
activities and tasks, which can be engaging for students and provide deeper information on
learning to teachers.53


Science and social studies assessments have different policy contexts from math, reading,
and each other. ESSA requires science exams once in each grade band.54 ESSA does not
require assessments in social studies, but 33 states offer some kind of social studies
assessment. Neither science nor social studies assessments are required components of
state accountability systems, although some states choose to use them.


Some states have cut back non-federally-required exams in these subjects in response to 
over-testing pressure. But advocacy groups, educators, and experts in these subject areas 
often want some kind of comparable assessment because it affirms the importance of the 
subject area, provides valuable performance and progress data, and helps protect against 
cutbacks in time or resources in favor of tested subjects.* This also has equity implications: 
Students in low-income schools and students of color are disproportionately likely to
experience cuts to science and social studies resources.

Science


States have not traditionally invested much time and resources into their science tests
compared with math and reading, a challenge that is compounded by many states’ adoption
of new, more rigorous and complex science standards in recent years.57 Forty states have
adopted new science standards in the past five years, inspired in large part by the influential
“Framework for K-12 Science Education,” released in 2012 by the National Research
Council.58 Following the release of the Framework, the National Academies of Science,
science educators, and Achieve collaborated to develop the Next Generation Science
Standards (NGSS).59 Nineteen states have adopted the NGSS, while 21 states have adopted
standards inspired by the Framework.


Figure 1: K-12 Science Standards Adoption

[Figure legend: Adopted NGSS; Standards Based on the Framework; Non-Framework Science Standards]

Source of Standards Adoption Information: NGSS, https://ngss.nsta.org/About.aspx.






The NGSS, and other Framework-based science standards, differ in many ways from prior
state standards. NGSS emphasizes learning and applying science concepts in the context of
real-world scientific phenomena and events.61 The defining feature of the NGSS is its three
interlocking core components: Disciplinary Core Ideas,62 Cross Cutting Concepts, and
Science & Engineering Practices.** Each NGSS standard includes all three core components, 


creating a “three-dimensional standard.” 


For example, a third-grade standard in the NGSS asks that students “obtain and combine
information to describe climates in the world.”65 This single standard includes a science and
engineering component on obtaining and evaluating information from various sources, core
ideas about understanding weather and climate, and a cross-cutting concept that patterns
in nature can be observed and used to make predictions.


It is possible to reliably assess a standard like this in a test format that stays true to the
intent of the standards. After all, math and ELA standards are also complex, but methods for
test and item design aligned with rigorous new standards are now fairly well established.
But science tests have not had comparable levels of investment and time from states, and
have not played the same role in accountability systems.


There are signs that some states are seeking to create better science tests, and investing in
instructional supports aligned to the rigor of new standards. State efforts in science include
cross-state collaboration on test items and design, hands-on tasks that can be used in class,
and instructional resources that lie outside the summative test. Science tests could get
a new infusion of support from the U.S. Department of Education: 2019 grants will favor
proposals with a Science, Technology, Engineering and Math (STEM) focus, especially in
computer science.66 Thus far, computer science assessments have mostly been the purview
of the AP, but the incentives of available federal funding could galvanize states that have
already demonstrated interest and investment in this subject area.


Benefits and Risks 


The primary tradeoffs in science assessment revolve around cost, technology, and
comparability. Technology could enable more ambitious, interactive assessments that use
computerized simulations to create performance-based science tasks.67 The more realistic
and rich simulations are also expensive to develop and complex to score. Tests that meld
the full interactive capabilities of modern video games, the depth of the science standards,
and the reliability of a traditional standardized test are likely still several years in the future;
however, the components to build them exist today.


Live, performance-based tasks in the classroom are also an attractive option for science
assessments, but these would still require research and careful training for teachers in
order to yield data that could be useful, valid, and reliable for accountability and instruction.

State Examples
New Hampshire 


New Hampshire has been a leading innovator in comparable, performance-based 
assessments through its Performance Assessment of Competency Education (PACE) 
system, which spans math, reading, and science. PACE began more than five years ago, 
but was recently approved under ESSA’s IADA pilot. PACE integrates locally designed 
performance assessments with more traditional tests in some grade levels. In science, 
students take performance assessments in grades three, four, six, seven, and eight. In fifth 
grade, students take a summative science test. In high school they take both a summative 
test and course-specific performance assessments. New Hampshire’s model demonstrates 
both the opportunities and the challenges of integrating live performance-based tasks 
into an assessment system for accountability purposes. The state has not scaled PACE 
statewide, but it has developed a tiered system by which districts can gradually transition 
into PACE, and plans to expand further in future years. 


Kentucky 


Kentucky was an early adopter of NGSS and one of the first states to publicly pursue new
science assessment development. Kentucky’s approach combines elements of interim test
design with performance-based tasks. As an example, students in high school are asked to
analyze the data from trials of a test dummy colliding with various airbag designs, and then
prepare a written argument for which airbag provides the most protection.69 The intent of
this design is to allow students to demonstrate standards skills and knowledge in class, as
they learn the material, often at a deeper level than is possible on a traditional standardized
assessment.70

OpenSciEd 


OpenSciEd is a collaborative effort among 10 states, curriculum developers, and science
experts to create an open-source curriculum aligned to the NGSS and the Framework,
answering the demand for high-quality instructional materials.71 The ultimate results will
likely include integrated formative assessments and other kinds of instructional resources
that might inform the shape of better summative science tests.




Social Studies 


Unlike science, social studies is not required by ESSA, and has not gone through the same 
transformative multistate standards experience. Despite the significant number of states 
with a social studies test requirement of some sort, innovation and investment seem to be 
largely absent, which is a major missed opportunity for those who want to lift up the value 
of social studies instruction, and encourage progress and equity in social studies. 


Table 1: Social Studies Assessments in States72

Assessment                                               Number of States   States
U.S. Citizenship Test (required to receive HS diploma)   8                  AL, AZ, AK, ID, MN, ND, TN, UT
K-8 Assessment                                           13                 CO, DE, GA, IN, KS, KY, LA, MI, SC, TN, TX, VA, WI
U.S. History & Government                                12                 FL, GA, KY, LA, MI, MO, NY, OH, OK, SC, TN, TX




Recent trends in social studies assessment have mostly been to cut state-developed tests,
and in eight states, to introduce the U.S. Citizenship Test as a high school requirement.
The citizenship test trend was backed in part by a major push by the Civics Education
Initiative, whose leaders were concerned by the de-emphasis of civics education in
schools and Americans’ low level of basic knowledge about government and civic
engagement.73 Social studies teachers and advocacy groups have pushed back against
this trend, claiming that the test is not a high-quality representation of the skills and
knowledge needed to develop a comprehensive view of civics. The citizenship test only
measures students’ ability to memorize discrete facts, and does not require meaningful
engagement with the depth or rigor of social studies content standards. Others have
argued that the test contains factual errors.74


Benefits and Risks 


Without accountability pressures, social studies could be a testing ground for new kinds of
performance-based tasks, growth models, or class-embedded assessments. However, these
kinds of tests may not be able to serve the same uses for tracking performance publicly and
understanding achievement gaps as tests that are subject to more stringent requirements.
A high-quality in-class social studies assessment might support better learning outcomes
for teachers who choose to use it, but there is no way to say for sure whether students are
progressing without any public reporting.

State Examples
Louisiana and Washington 


There are a handful of states doing interesting things with their social studies tests. As
described earlier in this brief, Louisiana is piloting an innovative assessment that integrates
social studies content with ELA skills. Washington is another notable state in social studies
test development, but its tests exist in a gray area: Students must take social studies
assessments in several grades, but districts may choose to use the state tests or develop
their own, which do not have to be comparable. The state’s test is a multistep project-based
social studies and civics assessment aligned to state standards, designed to push students’
critical thinking skills.75 These assessments also have connections to Washington’s English
Language Arts standards so that teachers are encouraged to integrate reading and writing
tasks in social studies instruction, and vice versa.76 The state legislature recently passed a
citizenship test requirement in addition to existing assessments, which will take effect in
2020 as part of a broader emphasis on civic inquiry, engagement, and action.


Science and Social Studies Assessment in the Future 


Both science and social studies assessments are ripe for innovation and improvement.
If policymakers are frustrated that their assessment ideas (be it performance-based
assessments, curriculum-embedded tests, or locally designed assessments) are not
immediately feasible for math and reading, the field of possibility for science and social
studies is wide open. Success designing new kinds of tests in these topic areas could also
provide an evidence base to support similar changes in math and reading, and encourage
new kinds of deep thinking about the relationship between high-quality assessment and
high-quality instruction.




Conclusion 


Across the topic areas above, states tended to fall into one of three camps. A small
number of trailblazing states are making big, public plays around innovation
in assessment. This group includes states that are out front in the innovative
assessment demonstration pilots, and those aggressively pursuing investments in areas
such as interim assessments for accountability. These states are committing significant
resources to new models and ideas, and pushing the boundaries of what a state assessment
system could look like.


The second group of states are shifting their role in assessment more gradually, and 
considering new ways to simultaneously advance equity, transparency, and teaching and 
learning. This group is also exploring new ways for states to engage in assessment and 
curriculum outside of their traditional lanes, although change has been slower and less 
consistent across topic areas. The trailblazers are few in number, but the second group, 
the shifters, may be larger than we know, suggesting a somewhat brighter future for state 
assessments than public discourse might suggest. 


A third group of states are in retreat on assessment, reducing any investment to the barest 
minimum. Some of these states may be at risk of compromising test quality in the name of 
cutting test time or responding to political pressures. This could leave students worse off, 
with less accurate information and transparency around equity gaps and long-term college- 
and career-readiness goals, and reverse the progress made over the past decade. More 
ambitious states may serve as models and proof points to reverse this trend. 

Greater attention to a positive role for assessment in instruction alongside accountability
and equity purposes for testing could help avert the backlash seen in earlier efforts. More
importantly, this renewed focus on state supports for standards-aligned instruction could
help fulfill the promise of more rigorous college- and career-ready standards: that if
states set a higher bar, more students will reach it. State movements towards innovative
assessments parallel growing state involvement in encouraging high-quality curricula,
facilitating more effective forms of instructional coaching and professional learning, and
other encouraging efforts to translate states’ goals for students into classroom-level
supports and change.


The specific topic areas explained above intersect with one another in unexpected
ways. We see examples of science assessments created by informal consortia, interim
assessments for accountability that fold in formative and instructional resources, and
innovative assessment proposals that bring interdisciplinary classroom instruction to the
forefront. Over $370 million in competitive grants that the federal government plans to
award in summer 2019, with priority areas that intersect with many of these ideas, could
also reshape the field.


Of course, not everything states could do is something they should do. When millions of
dollars and precious student learning time are at stake, states must weigh the potential
risks of change. Stability may be especially valuable among states that have experienced
three or more assessments in as many years. New innovations and ideas, especially in areas
of assessment that are not in states’ typical responsibilities, should not jeopardize previous
progress on accountability, equity, and transparency.


This work can be extremely difficult. States embarking on this path should have clear goals,
a theory of action centered on equity and student learning, and realistic expectations for
the substantial resources, capacity, and time it takes to research, develop, implement,
and sustain a comprehensive system of high-quality assessments. They also should not
underestimate the importance of proactive, clear communications with districts, educators,
families, and political leaders outside of education.


The opportunities for assessment improvement available under ESSA, with or without
any federal permission slip, should be something that states seriously consider. States are
building their goals, strategies, and accountability systems on assessments, and using them
to guide instructional decisions or factor into graduation and student promotion. Seeking
continuous improvement and innovation in this area is important and urgent because of the
ongoing impact and importance of high-quality assessment systems for students, teachers,
and our educational system as a whole.